Dominant Folding Pathways of a (5 Hairpin 



Pietro Faccioli*, 1 ' 2 Alice Lonardi, 1,3 and Henri Orland 4 

1 Dipartvmento di Fisica, Universitd degli Studi di Trento, 
Via Sommarive 14, Povo (Trento) Italy, 1-38100. 
2 I.N.F.N., Gruppo Collegato di Trento, Via Sommarive 14, Povo (Trento) Italy, 1-38100. 
Dipartimento di Scienze e Tecnologie Chimiche, Universitd di Roma Tor Vergata, 
Via della Ricerca Scientifica 1-00133 Rome, Italy 
Institut de Physique Theorique, Centre dEtudes de Saclay, CEA, IPhT, F-91191, Gif-sur-Yvette, France 

We use the Dominant Reaction Pathway (DRP) approach to study the dynamics of the folding of 
a (3 hairpin, within a model which accounts for both native and non-native interactions. We compare 
the most probable folding pathways calculated with the DRP method with those obtained directly 
from molecular dynamics (MD) simulations. We find that the two approaches give completely 
consistent results. We investigate the effects of the non-native hydrophobic interactions on the 
folding dynamics found them to be small. 



I. INTRODUCTION 

The theoretical investigation of the microscopic dy- 
namics driving the protein folding reaction remains a 
very challenging open problem. In general, numerical 
simulations based on molecular dynamics (MD) are very 
inefficient for this purpose. The reason is that most of 
the computational time is wasted to simulate the thermal 
oscillations in the (meta)-stable states, while the relevant 
information about the folding mechanism is encoded in 
the reactive trajectories which connect denatured and na- 
tive configurations. 

In order to overcome these difficulties, alternative ap- 
proaches have been developed which yield directly the 
folding pathways, without investing time in simulating 
the exploration of metastable states pQ, [2 [21 SI IS] [BJ [7]. 
In particular, the DRP method [5113 IHUH] has the advan- 
tage to yield directly the most statistically relevant reac- 
tion pathways in Langevin dynamics, and to sample the 
reaction using equal configurational displacement steps, 
rather than equal time steps. This way, it is possible to 
characterize a folding transition using only a few tens of 
path discretization steps. The DRP approach has been 
recently extended to predict the dynamics of both elec- 
tronic and nuclear degrees of freedom during thermally 
activated reactions, from quantum mechanical calcula- 
tions in the Born-Oppenheimer approximation |10j . 

The DRP method is in general very computation- 
ally efficient compared to " brute" -force MD simulations. 
However, there are several potential limitations which 
have to be taken into account. First, it cannot predict 
the structure of the native and denatured configurations, 
but only determine the dynamics of the conformational 
transitions which connects given input initial and final 
configurations. While for many proteins the structure 
of the native state has been accurately determined from 
X-ray cristallography or NMR experiments, much less 
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FIG. 1: Upper panel: the 16-residue C-terminus of protein 
G-Bl (PDB code 2gbl). Lower panel: its coarse grained rep- 
resentation used in the present study. 



is usually known about the conformational structure of 
polypeptide chains in the denatured state. A commonly 
adopted strategy to generate model unfolded conforma- 
tions is to run high temperature MD unfolding simula- 
tions. Such an approach is certainly appropriate to study 
the folding of molecules which have been denatured by 
temperature jumps (see e.g. |16j and references therein). 

A second limitation which is common to all approaches 
which focus on the reaction pathways pQ, 0, [3 H], 
O [7] is represented by the computational difficulty of 
performing an exhaustive exploration of the transition 
path space. Problems arise from the fact that the fold- 
ing pathways which are sampled depend on a very large 
number of degrees of freedom. In addition, the explo- 
ration of the path space is slowed down by the intrinsic 
ruggedness of the energy landscape. In view of such diffi- 
culties, the question has been raised whether the folding 
trajectories obtained in such approaches can be consid- 
ered realistic representation of the folding reaction. 

Finally, an important question which remains to be 
answered is how many of such reactive trajectories are 
needed in order to completely characterize the dynam- 
ics of a folding transition. Indeed, if the set of differ- 
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ent pathways along which a protein can fold is exponen- 
tially large, then any theory which allows to simulate 
only a relatively small number of folding trajectories will 
be useless. In this case, one must necessarily rely on 
macroscopic descriptions, based on the dynamics of suit- 
ably chosen reaction coordinates — see e.g. [TT] — , or 
on the kinetic transitions between thermodynamic states 
[T21 IT51 RH] . By contrast, if the evolution of arbitrary 
order parameters during the folding reaction can be ac- 
curately inferred from, say, a few tens of simulated folding 
trajectories, then microscopic approaches based on com- 
puting folding trajectories in configuration space become 
useful. 

In view of these potential limitations, in [5] a study of 
the accuracy of the DRP approach in predicting protein 
folding trajectories was performed using an off-lattice Go- 
type model of a 16-residue polypeptide chain. It was 
shown that, by averaging over a handful of dominant re- 
action trajectories — each one corresponding to a differ- 
ent initial condition in the denatured state — the region 
of lower free energy connecting the native and denatured 
state could be accurately located. 

In principle, it is not guaranteed that the DRP ap- 
proach remains reliable when one considers more sophis- 
ticated models, characterized by a higher degree of frus- 
tration. Indeed, more complicated energy functions may 
significantly increase the space of statistically significant 
protein folding pathways making it more difficult or even 
unfeasible to identify the physically important pathways. 
It is therefore important to test the DRP method using 
models in which the landscape is corrugated. The sim- 
plest way to do so is to consider a coarse-grained model 
in which also non-native interactions are taken into ac- 
count 1 . 

In this work we implement a Go-type coarse-grained 
model for the polypeptide chain considered in [5], in 
which non-native interactions are also included, based 
on the hydrophobic or polar character of the residues. 
We compare directly the folding pathways extracted from 
long MD simulations, with the corresponding dominant 
folding trajectories, obtained in the DRP approach. 



II. RESULTS AND DISCUSSION 

We have used the coarse-grained model defined in sec- 
tion |III| to investigate the folding dynamics of the 16- 
residue C-terminus of protein GB1. In the native state 
of the intact protein, this terminus assumes the structure 
shown in the upper panel of Fig. [T] NMR experiments in- 
dicate that, in aqueous solution and ordinary thermody- 
namic conditions, the structure of the hydrophobic clus- 
ter is preserved also in the isolated protein terminus |15j. 
Fig. [2] shows the time evolution of the fraction of native 
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FIG. 2: Time evolution of the fraction of native contacts in 
a MD simulation based on the coarse-grained model used in 
this work, at the temperatures T = 200 K and T = 300 K. 



contacts 2 obtained from 10 4 steps of MD simulations (us- 
ing Brownian dynamics) at the temperatures T — 200 K 
and T = 300 K. These results show that the equilibrium 
configurations generated by coarse-graiend model form 
two states: a native state (with x n > 0.6) and a dena- 
tured state (with x n < 0.3). At T = 200 K (T = 300 K) 
the equilibrium population is predominantly native (de- 
natured) . 

The dominant reaction trajectories calculated with the 
DRP approach described in section |III| were used to ob- 
tain predictions for the average time evolution of observ- 
ables, using Eq. ([3]). In particular, in this work we con- 
sidered the dynamics of the following set of order param- 
eters: 

• The distance between residues 1 and 16, di_i6(r): 
the parameter was defined to be native for <ii_i6 < 
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2 Two residues are considered in contact if their distance is less 
than 0.6 nm. 
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0.6nm and denatured for G?i_i6 > l.lnm 
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FIG. 3: Time evolution of the order parameters di-ie (upper 
panel), gLi_i2 (second panel from the top) Gh (third panel 
from the top) and x n (lowest right) as a function of the frac- 
tion of total folding time. Squares and circles denote the 
results of the DRP with and without non-native contacts, 
respectively. Dashed and dotted lines represent the results 
of MD trajectories obtained using a potential energy function 
with only native contacts and with both native and non-native 
contacts, respectively. 



• The distance between residues 4 and 12, d4_i2(r): 
the parameter was defined to be native for d^\i < 
0.4nm and denatured for G?4_i2 > 0.8nm 

• The radius of gyration of the hydrophobic clus- 
ter formed by the residues TRP, PHE and TYR, 
Gh(t): the parameter was defined to be native for 
Gh < 0.45nm and denatured for Gh > 0.55nm 

• The fraction of native contacts x n (r): the param- 
eter was defined to be native for x n > 0.7 and de- 
natured for x n < 0.3. 

The results of our DRP calculation at T = 300 K are 
presented in Fig|3j where they are compared with the 
results obtained by evaluating the same order parame- 
ters along the folding pathways calculated directly from 
long MD simulations, in the same model and at the same 
temperature. Since the friction coefficient 7 is a free pa- 
rameter of the present model, whose only role is to set 
the total time scale, we plotted the results as a function 
of the total transition time r/t. The model denoted with 
"Go" corresponds to one in which only native interac- 
tions are included, while "Go-HPN" refers to a model 
in which non-native hydrophobic and hydrophilic effects 



are taken into account as well (see section 111 for further 
details) . 

These results show that, at least for the system under 
consideration, the DRP method quantitatively predicts 
the average dynamics of all the order parameters con- 
sidered, during the folding transition. This represents a 
clear evidence that the method is predictive even if a rel- 
atively small number of dominant paths are considered 
and even if the energy landscape is quite rugged. 

Interestingly, we find that the structure of the folding 
pathways is not significantly altered, when non-native 
interactions are removed. A small discrepancy between 
the two models seems to emerge only for the dynamics of 
the distance between the two THR residues at position 
4 and 12, in the chain (second panel from the top). The 
evolution of such a parameter is the result of the competi- 
tion between the effective hydrophilic repulsion between 
these residues, and their tendency to collapse, which is 
generated by the presence of a hydrophobic cluster near 
by, formed by the TRP, PHE and TYR residues. In the 
initial phase of the folding the burying of the hydropho- 
bic cluster overrules the effective hydrophilic repulsion 
between the THR residues, and the distance di_ 12 falls 
shorter in the Go-HPN model than in the simple Go- 
model. However, in the last stage of the folding the two 
THR come very close together and the hydrophilic effect 
tends to keep them more separated than in the Go-model. 
The crossover between the hydrophobic dominated stage 
and the hydrophilic dominated stage of the reaction oc- 
curs approximatively at two-thirds of the total folding 
time. This is an example of the type of dynamical infor- 
mation which is made accessible by the DRP approach. 
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The overall agreement between Go and Go-HPN calcula- 
tions represents an evidence in support of the reliability 
of our analysis of the folding mechanism for this systems, 
based on Go-type models — see e.g. [TB] — . 

To summarize, in this work we have tested the accu- 
racy of the DRP method, by comparing directly its pre- 
dictions for the average time evolution of a set of order 
parameters during the folding of a /3-hairpin, with the 
folding trajectories obtained directly from molecular dy- 
namics simulations, within in the same model. We found 
that the two approaches give results which are completely 
consistent, within the statistical errors. 

We have analyzed the role of non-native interactions 
in determining the structure of the folding pathways. 
We found that in order to accurately predict the time 
evolution of observables it is sufficient to average over 
a relatively small set of dominant paths, each one cor- 
responding to a different initial condition. Furthermore, 
we found that the method remains reliable even when the 
underlying energy landscape generated by the potential 
energy is quite corrugated. 



• Ai 



3 and Bij 



T, for paris in which one of 



the amino-acids is polar (P) 



• Ay = 1 and Bij 



if one of the residues is GLY, 
which is hydrophobically neutral (N) 

In the following, we shall refer to the model in which the 
Gij, Aij and B^ coefficients are defined this way as to 
the Go-HPN model. Clearly, by setting all — 1 and 
Bij = one recovers the minimally frustrated Go-model 
used in [5]. 

As a concluding remark for this section, we stress that 
the coupling B^ and Gy do not represent disentangled 
physical effects, since the stability of the native structure 
is known to be largely influenced by the hydrophobic ef- 
fect. Hence, such an effect is encoded implicitly also in 
the structure of the contact map Gy . The rationale for 
adding also the B^ coefficients to the potential energy is 
to account for non-native hydrophobic interactions and 
to increase the frustration of the model. By switching 
on and off the B^ terms it is possible to increase the 
ruggedness of the energy landscape, and to study how 
this affects the structure of the folding pathways. 



III. METHODS 

A. Definition of the Coarse-Grained Model 

We adopt a coarse-grained representation of such a 
poly-peptide chain, in which the explicit degrees of free- 
dom are beads which describe the single amino-acids — 
see the lower panel of Fig. [T] — . The energy function of 
this model is assumed to be the sum of pair-wise inter- 
actions: 



L/(R) = i]Tfc(|r fc+1 -r fe |-a) 2 



IE. 



Ai 



12 



(Gij + B^) 



(1) 



The first term provides chain connectivity, where 
a = 0.38 nm represents the average distance be- 
tween two consecutive a-carbons on the chain and k = 
3000 kJ moZ _1 nm~ 2 is the elastic constant of the har- 
monic spring. The strength of the Lennard- Jones at- 
traction is set by the parameter e = 4 kJ mol -1 , while 
a = 0.3 nm represents an effective residue size. Gij is 
the matrix of native contacts, i.e. Gy is set to 1 if the 
distance between the residues i and j in the native confor- 
mation is less than 0.65 nm, and otherwise. The coeffi- 
cients A^ and Bij introduce residue specificity based on 
the hydro-philic and hydro-phobic characters of the in- 
dividual amino-acids. In analogy with the so-called HP 
model [T5], they are defined as follows: 

• Aij = 1 and B^ = 1 , for pairs in which both amino- 
acids are hydrophobic (H) 



B. The Dominant Reaction Pathways Approach 

Let us now briefly review the DRP approach — for a 
detailed presentation see e.g. [7] — . Let R be a point 
in configuration space for the macromolcculc under con- 
sideration. In this particular case, R is defined by the 
coordinates of all 16 residues R = (tx, ... ,i"i 6 ). We as- 
sume that the dynamics of the protein in solution obeys 
the over-damped Langevin Eq.: 



R 



-W(R) + 77(i), 



(2) 



where rj(t) are usual Gaussian noise functions, obeying 
the fluctuation-dissipation relationship, and 7 is the vis- 
cosity coefficient, which is inversely proportional to the 
diffusion coefficient D, 7 = (with — -^Kp).The ac- 
celeration term, which appears in the original Langevin 
Eq., can be shown to be negligible for time scales larger 
than a fraction of ps. 

The time evolution of an arbitrary configuration- 
dependent observable O(R) during a folding transition 
lasting a time t is given by: 

dR" / dR'e a h D (R') 

h N (R") / PRO(R(t)) e Jq V 4 

(3) 

where P(t) is the (un-normalized) probability to fold in 
the time interval t: 



P(t) 



dR" J dR'h D (R')h N (R")e- 



ff(E/(R")-[/(R)) 
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(4) 

r is an intermediate instant during the folding, i.e. < 
t < t, and V e ff(H) is called the effective potential, de- 
fined as 

V eff (R) = i- ((V£/(R)) 2 - |v 2 L/(R)) . (5) 

/id(R') and ftjv(R"') in Eq.s ^ and Q are the char- 
acteristic functions of the native and denatured state, 
respectively: = 1 if R is a configuration in the 

native (denatured) state, and otherwise. 

The DRP approach is based on the saddle-point ap- 
proximation of the path integrals in ([3]). The idea is 
to consider only the folding trajectories with the largest 
probability, i.e. those for which the "action" functional 

Seff = J dr' 1 - R 2 (r') + K//[R(r')] (6) 

is minimum. A dramatic simplification is obtained upon 
observing that the Eq.s of motion generated by the ef- 
fective "action" S e ff are energy-conserving and time- 
reversible. These properties allows us to switch from 
the tame-dependent Newtonian description to the energy- 
dependent Hamilton-Jacobi (HJ) description. We note 
that this could not be done at the level of the original 
Langevin equation. 

In the HJ framework, the most probable (or so-called) 
dominant pathways connecting a given denatured con- 
figuration Rd and a given native configuration R„ is ob- 
tained by minimizing numerically — e.g. via simulated 
annealing- a discretized version of the target function 
(HJ functional) 

Shj = [ dl^/(E eff + V eff [R(l)}), (7) 

where dl is an infinitesimal displacement along the path 
trajectory — see e.g. [7] for further details — . E e ff is a 
free parameter which determines the total time elapsed 
during the transition, according to: 

tn ~ td= L dl ^(E eff+ % ff mm- (8) 

In [9] it was shown that the longest transition time 
for a single folding event follows from choosing E e ff = 
— V e ff(x]y). However, such a choice generally leads to 
a very low acceptance rate, when one uses a global 
minimization algorithm such as simulated annealing to 
find the dominant paths. The reason is that, for most 
trial moves, the HJ action becomes complex. To avoid 
this problem, in this work we consider shorter transi- 
tion times: we begin the minimization of the HJ action 



using a very large effective energy, E e ff = V e //(x]y) + 
alK/ztxisr)!, with a — 1000. The parameter a is then 
gradually reduced to a = 10, during the minimization. 

We emphasize that the time interval t = t n — t ( i is 
not the mean-first passage time from the denatured to 
the native state. It is the time it takes to fold, once the 
transition has been initiated. In other words, the DRP 
formalism is concerned only with the dynamics in the re- 
active part of the trajectories and not with the dynamics 
of exploration of the native and denatured states. 



C. Numerical Implementation of the DRP 
Approach 

The numerical minimization of the HJ functional re- 
quires the choice of a set of native and denatured config- 
urations. In addition, one needs to define a correspond- 
ing set of trial trajectories connecting the native and de- 
natured configurations, from which the numerical mini- 
mization of the HJ functional is initiated. A sample of 
native configurations can be easily obtained from short 
MD simulations at room temperature, starting from the 
experimentally known native structure. In the present 
work, we generated the denatured configurations and the 
corresponding initial trial paths from 30 MD unfolding 
simulations at T = 2000 K, starting from the determined 
native configurations. The number of frames in such tra- 
jectories were decimated to 23 by averaging the residue 
coordinates over blocks of consecutive frames, as in |5j. 
The search for the dominant reaction pathways start- 
ing from each of such trial trajectories was performed by 
applying 100,000 steps of simulated annealing. At each 
step, the update of the system's configuration was made 
according to the following prescription: in each frame of 
the path, a global trial move was performed using either 
global cartesian shifts of all residue coordinates, or by 
rotating part of the molecule around one of the bonds 
(i.e. the pivot algorithm [H]). The probability of per- 
forming a move based on the pivot algorithm was larger 
for the first frames — which correspond to denatured 
configurations — while the cartesian moves were statis- 
tically favored in the frames corresponding to the latest 
stage of the folding, where residues are packed and moves 
based on the pivot algorithm would have a lower accep- 
tance rate. The boldness of the moves was adapted to 
keep the global acceptance rate around 50%. The be- 
havior of the HJ action during a typical minimization is 
given in Fig. [4] 

Ideally, the resulting dominant pathways should be in- 
dependent on the choice of the initial trial paths. In prac- 
tice, any global minimization algorithm allows to explore 
only some functional neighborhood of the initial condi- 
tions. Hence, for the DRP to work, the high-temperature 
unfolding trajectories must not be too different from the 
folding pathways at room temperature. 
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FIG. 4: Typical evolution of the HJ action during the numeri- 
cal minimization based on the simulated annealing algorithm. 
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